Nested parallelism: Allocation of threads to tasks and OpenMP implementation

Authors

  • Ragnhild Blikberg
  • Tor Sørevik
Abstract

In this paper we discuss the use of nested parallelism. Our claim is that if a problem naturally possesses multiple levels of parallelism, then applying parallelism at all levels may significantly enhance the scalability of the algorithm. This claim is supported by numerical experiments. We also discuss how to implement multi-level parallelism using OpenMP. We find that current OpenMP implementations, based on version 1.0, have severe limitations for implementing nested parallelism. We then show how these limitations can be circumvented by explicitly assigning tasks to threads. Load balancing becomes more complicated with two (or more) levels of parallelism. To handle this problem, we have designed a distribution algorithm that groups threads into teams, each team being responsible for one coarse-grained outer-level task. This algorithm is proven to produce the optimal load balance under the given assumptions.
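
The idea of grouping threads into teams, one team per outer-level task, can be illustrated with a small sketch. This is not the authors' distribution algorithm, which the abstract does not spell out; it is a simple greedy heuristic under assumptions made here: the cost of each outer task is known in advance, there are at least as many threads as tasks, and the work inside a task speeds up perfectly with the size of its team. The names `allocate_threads` and `task_of_thread` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split nthreads among ntasks outer-level tasks.
 * work[i] : estimated cost of task i (assumed known and positive)
 * team[i] : output, number of threads given to task i
 * Assumes nthreads >= ntasks and perfect speedup inside each task. */
void allocate_threads(const double *work, size_t ntasks,
                      int nthreads, int *team)
{
    assert(ntasks > 0 && (size_t)nthreads >= ntasks);

    /* Every task needs at least one thread. */
    for (size_t i = 0; i < ntasks; ++i)
        team[i] = 1;

    /* Hand out the remaining threads one at a time to the task
     * whose current time estimate work[i] / team[i] is largest. */
    for (int left = nthreads - (int)ntasks; left > 0; --left) {
        size_t worst = 0;
        for (size_t i = 1; i < ntasks; ++i)
            if (work[i] / team[i] > work[worst] / team[worst])
                worst = i;
        team[worst] += 1;
    }
}

/* Map a flat thread id (0 .. nthreads-1) to the task it serves.
 * Stores the thread's rank within its team in *rank.
 * Returns ntasks (and rank -1) if tid is out of range. */
size_t task_of_thread(const int *team, size_t ntasks, int tid, int *rank)
{
    int first = 0; /* flat id of the first thread in the current team */
    for (size_t i = 0; i < ntasks; ++i) {
        if (tid < first + team[i]) {
            *rank = tid - first;
            return i;
        }
        first += team[i];
    }
    *rank = -1;
    return ntasks;
}
```

Inside a single OpenMP parallel region, each thread could pass the value of `omp_get_thread_num()` to `task_of_thread` to find its outer task and its rank within the team, then split that task's inner loop by rank. This gives two levels of parallelism without relying on nested parallel regions.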


Similar resources

Load balancing and OpenMP implementation of nested parallelism

Many problems have multiple layers of parallelism. The outer-level may consist of few and coarse-grained tasks. Next, each of these tasks may also be rich in parallelism, and be split into a number of fine-grained tasks, which again may consist of even finer subtasks, and so on. Here we argue and demonstrate by examples that utilizing multiple layers of parallelism may give much better scaling ...


Performance Evaluation of OpenMP Applications with Nested Parallelism

Many existing OpenMP systems do not sufficiently implement nested parallelism. This is supposedly because nested parallelism is believed to require a significant implementation effort, incur a large overhead, or lack applications. This paper demonstrates Omni/ST, a simple and efficient implementation of OpenMP nested parallelism using StackThreads/MP, which is a fine-grain thread library. Thanks to Sta...


Task-Based Execution of Nested OpenMP Loops

In this work we propose a novel technique to reduce the overheads related to nested parallel loops in OpenMP programs. In particular we show that in many cases it is possible to replace the code of a nested parallel-for loop with equivalent code that creates tasks instead of threads, thereby limiting parallelism levels while allowing more opportunities for runtime load balancing. In addition we...


NanosCompiler: supporting flexible multilevel parallelism exploitation in OpenMP

This paper describes the support provided by the NanosCompiler to nested parallelism in OpenMP. The NanosCompiler is a source-to-source parallelizing compiler implemented around a hierarchical internal program representation that captures the parallelism expressed by the user (through OpenMP directives and extensions) and the parallelism automatically discovered by the compiler through a detail...


Support and Efficiency of Nested Parallelism in OpenMP Implementations

Nested parallelism has been a major feature of OpenMP since its very beginnings. As a programming style, it provides an elegant solution for a wide class of parallel applications, with the potential to achieve substantial utilization of the available computational resources, in situations where outer-loop parallelism simply cannot. Notwithstanding its significance, nested parallelism support w...



Journal title:
  • Scientific Programming

Volume 9  Issue 

Pages  -

Publication date 2001